Add rake task to update missing replacement links #9701
Merged
Context
Multiple Zendesk tickets about whitehall assets being live when they should not be triggered an analysis of the problematic data (PR). We've investigated the assets further, extracting a batch of assets that should not be live because they should be replaced.
A subset of these assets corresponds to a now-identified bug: if the user replaces an attachment on a new draft without having saved the edition draft first, the supposedly replaced original asset remains live, because it fails to redirect to its replacement. The data extraction rake task looped through all `AttachmentData`s whose `deleted?` method returns false, that had a replacement set in `whitehall`, but no corresponding replacement value set on their assets in `asset-manager`. The resulting subset of assets that have this issue can be found here.

This PR provides a rake task to address the assets that are likely a result of this bug, by populating their replacement asset in `asset-manager`. It takes `whitehall` as a source of truth, relying on the `AttachmentData` and `Asset` objects to deduce the value of the `replacement_id` field.

This is only a data patch solution. A fix for the issue will be implemented separately.
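As a rough illustration of how a replacement value can be deduced from `whitehall` data, the sketch below follows a replacement chain to its last entry and picks the asset with a matching variant. It is not the code in this PR: the `Struct` definitions are simplified stand-ins for the real models, and `replaced_by`, `assets`, `asset_manager_id`, `last_replacement` and `replacement_asset_id` are hypothetical names chosen for the example.

```ruby
# Hypothetical, simplified stand-ins for the whitehall models.
# The real AttachmentData and Asset classes carry far more state.
Asset = Struct.new(:asset_manager_id, :variant)
AttachmentData = Struct.new(:id, :replaced_by, :assets)

# Follow the replaced_by links until the end of the chain.
# Returns nil when the starting record has no replacement at all.
def last_replacement(attachment_data)
  return nil if attachment_data.replaced_by.nil?

  seen = { attachment_data.id => true }
  current = attachment_data.replaced_by
  while current.replaced_by
    # Guard against a cyclic chain so the loop always terminates.
    raise ArgumentError, "replacement cycle at #{current.id}" if seen[current.id]

    seen[current.id] = true
    current = current.replaced_by
  end
  current
end

# Deduce the asset-manager ID that the replaced asset should point to:
# the asset of the same variant on the last AttachmentData in the chain.
# Returns nil if there is no replacement or no asset with that variant.
def replacement_asset_id(attachment_data, variant)
  final = last_replacement(attachment_data)
  return nil if final.nil?

  asset = final.assets.find { |a| a.variant == variant }
  asset&.asset_manager_id
end

# Example chain: first -> second -> third
third = AttachmentData.new(3, nil, [Asset.new("am-3-original", "original")])
second = AttachmentData.new(2, third, [Asset.new("am-2-original", "original")])
first = AttachmentData.new(1, second, [Asset.new("am-1-original", "original")])

replacement_asset_id(first, "original") # => "am-3-original"
```

Resolving to the end of the chain, rather than to the immediate successor, is what keeps a replaced asset from pointing at an intermediate entry that may itself be superseded or still in draft.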
The rake task
This task expects a `csv` file containing, on each row, the `AttachmentData` ID, the `Asset` ID (from `asset-manager`), and the asset's `variant` (i.e., "original" or "thumbnail").

We originally tried to run the `AssetManager::AttachmentUpdater.replace(replaced_attachment_data)` line on a selection of `AttachmentData` IDs, but that only linked the corresponding assets to their immediate successors, which were likely in draft. There is some logic in `asset-manager` that ensures all assets in the chain point to the last one after a publish event. The data we're working with is now superseded and it's difficult to put it back to normal using the common flows of the code, so we replicated that logic for the rake task:

- If the `AttachmentData` does not have a replacement, we skip. The reported asset IDs we're running this on were extracted based on whether a `replaced_by_id` field was set. Nonetheless, some of those `AttachmentData`s are deleted.
- [...] `AttachmentData` in the chain.
- [...] `variant` value passed through in the `csv`).
- [...] `asset-manager`, logging any errors.

We expect the end-of-the-line replacement to either still be live, or to be a broken asset (for example a draft + deleted one, which corresponds to another bug we know of) that will be taken care of in the next rake task. We will be running this rake task first, to ensure we have a healthy replacement chain, then catch all other issues with the next rake task. There are a few errors in the trial run logs, including `unprocessable content` and `parent document url` validation, but we expect any problematic data to be caught by the next rake task; the few `not found` errors are assets that have already been deleted (perhaps through manual intervention), which means they are not a problem anymore.

Links
Trello card
Assets we fixed
Data extraction/reporting rake task
Assets investigation document
Co-authored-by @minhngocd